The purpose of this notebook is to give data locations, data ingestion code, and code for rudimentary analysis and visualization of COVID-19 data provided by The New York Times [NYT1].
The following steps are taken:
1. Ingest data:
   - Take COVID-19 data from The New York Times, based on reports from state and local health agencies [NYT1].
   - Take USA county records data (FIPS codes, geo-coordinates, populations) [WRI1].
   - Merge the data.
2. Make data summaries and related plots.
3. Make corresponding geo-plots.
Note that other, older repositories with COVID-19 data exist, for example, [JH1, VK1].
Remark: The time series section is done for illustration purposes only. The forecasts there should not be taken seriously.
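The following capwords function is taken from the R help page of tolower (?tolower); it capitalizes the first letter of each word, and it is used below to capitalize the data column names: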
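# capwords: capitalize the first letter of each space-separated word;
# with strict = TRUE the remaining letters are lower-cased
capwords <- function(s, strict = FALSE) {
  cap <- function(s) paste(toupper(substring(s, 1, 1)),
                           {s <- substring(s, 2); if (strict) tolower(s) else s},
                           sep = "", collapse = " ")
  sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
}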
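A quick check of what capwords does to the column names of the NYT state file (the lowercase header names date, state, fips, cases, deaths are assumed from that CSV). Only the first letter is upper-cased, which is why the FIPS column ends up named Fips rather than FIPS; this matters later when the county data is joined on c("Fips" = "FIPS").

```r
# capwords as defined above (repeated so this block runs on its own)
capwords <- function(s, strict = FALSE) {
  cap <- function(s) paste(toupper(substring(s, 1, 1)),
                           {s <- substring(s, 2); if (strict) tolower(s) else s},
                           sep = "", collapse = " ")
  sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
}

# Assumed header of us-states.csv
nytNames <- c("date", "state", "fips", "cases", "deaths")
newNames <- capwords(nytNames)
print(newNames)
# [1] "Date"   "State"  "Fips"   "Cases"  "Deaths"
```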
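# Read the state-level data only once per session
if (!exists("dfNYDataStates")) {
  dfNYDataStates <- read.csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv",
                             # columns: date, state, fips, cases, deaths;
                             # fips is read as character to keep leading zeros (e.g., "06")
                             colClasses = c("character", "character", "character", "integer", "integer"),
                             stringsAsFactors = FALSE)
  colnames(dfNYDataStates) <- capwords(colnames(dfNYDataStates))
}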
head(dfNYDataStates)
dfNYDataStates$DateObject <- as.POSIXct(dfNYDataStates$Date)
summary(as.data.frame(unclass(dfNYDataStates), stringsAsFactors = TRUE))
      Date                State             Fips           Cases            Deaths                DateObject
 2020-03-28:  55   Washington   : 127   53     : 127   Min.   :     1   Min.   :    0.0   Min.   :2020-01-21 00:00:00
 2020-03-29:  55   Illinois     : 124   17     : 124   1st Qu.:   116   1st Qu.:    2.0   1st Qu.:2020-03-23 00:00:00
 2020-03-30:  55   California   : 123   06     : 123   Median :  1638   Median :   44.0   Median :2020-04-14 00:00:00
 2020-03-31:  55   Arizona      : 122   04     : 122   Mean   : 12235   Mean   :  680.6   Mean   :2020-04-12 17:08:18
 2020-04-01:  55   Massachusetts: 116   25     : 116   3rd Qu.:  8942   3rd Qu.:  322.0   3rd Qu.:2020-05-05 00:00:00
 2020-04-02:  55   Wisconsin    : 112   55     : 112   Max.   :368669   Max.   :29241.0   Max.   :2020-05-26 00:00:00
 (Other)   :4359   (Other)      :3965   (Other):3965
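The DateObject column is obtained by converting the ISO 8601 date strings with as.POSIXct. A small sketch of that conversion, using the Min. and Max. of DateObject from the summary above to compute the time span covered by the state data. Passing tz = "UTC" is an assumption made here for reproducibility; the notebook itself converts in the session's local time zone.

```r
# tz = "UTC" is an assumption; without it the local time zone is used
firstDate <- as.POSIXct("2020-01-21", tz = "UTC")
lastDate  <- as.POSIXct("2020-05-26", tz = "UTC")

# Number of days between the first and the last record (2020 is a leap year)
spanDays <- as.numeric(difftime(lastDate, firstDate, units = "days"))
print(spanDays)
# [1] 126

# as.Date() is a lighter alternative when only calendar days matter
print(as.Date("2020-05-26") - as.Date("2020-01-21"))
# Time difference of 126 days
```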
Summary by state:
by( data = as.data.frame(unclass(dfNYDataStates)), INDICES = dfNYDataStates$State, FUN = summary )
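by() splits the rows of the data frame according to the INDICES values (here, the state names) and applies FUN to each piece. The sketch below shows the same split-apply idea on a made-up toy table, once with by() and once with base split() plus sapply(), which returns a plain named vector. The toy data frame and its numbers are hypothetical.

```r
# Hypothetical stand-in for dfNYDataStates (made-up numbers)
dfToy <- data.frame(
  State = c("Arizona", "Arizona", "Illinois", "Illinois", "Illinois"),
  Cases = c(10L, 25L, 5L, 40L, 90L),
  stringsAsFactors = FALSE
)

# by(): split rows by State and apply FUN to each sub-data-frame
byResult <- by(dfToy, INDICES = dfToy$State, FUN = function(d) max(d$Cases))
print(byResult)

# The same split-apply step with split() + sapply()
maxCasesPerState <- sapply(split(dfToy$Cases, dfToy$State), max)
print(maxCasesPerState)
```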
Alternative summary:
Hmisc::describe(dfNYDataStates)
# Read the county-level data only once per session
if (!exists("dfNYDataCounties")) {
  dfNYDataCounties <- read.csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv",
                               # columns: date, county, state, fips, cases, deaths;
                               # fips is read as character to keep leading zeros
                               colClasses = c("character", "character", "character", "character", "integer", "integer"),
                               stringsAsFactors = FALSE)
  colnames(dfNYDataCounties) <- capwords(colnames(dfNYDataCounties))
}
head(dfNYDataCounties)
dfNYDataCounties$DateObject <- as.POSIXct(dfNYDataCounties$Date)
summary(as.data.frame(unclass(dfNYDataCounties)))
       Date              County             State              Fips              Cases              Deaths                DateObject
 Length:179696      Length:179696      Length:179696      Length:179696      Min.   :     0.0   Min.   :    0.00   Min.   :2020-01-21 00:00:00
 Class :character   Class :character   Class :character   Class :character   1st Qu.:     4.0   1st Qu.:    0.00   1st Qu.:2020-04-09 00:00:00
 Mode  :character   Mode  :character   Mode  :character   Mode  :character   Median :    16.0   Median :    0.00   Median :2020-04-26 00:00:00
                                                                             Mean   :   319.3   Mean   :   17.75   Mean   :2020-04-24 12:34:45
                                                                             3rd Qu.:    76.0   3rd Qu.:    2.00   3rd Qu.:2020-05-11 00:00:00
                                                                             Max.   :204111.0   Max.   :20795.00   Max.   :2020-05-26 00:00:00
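Unlike the state-level summary, the county summary above was computed without stringsAsFactors = TRUE, so Date, County, State and Fips stay character vectors, and summary() reports only their length, class and mode. For factors, summary() counts the occurrences of each level instead. A small illustration on a made-up vector of state names:

```r
# Hypothetical values
states <- c("Texas", "Georgia", "Texas", "Kansas", "Texas")

# Character vector: only Length, Class and Mode are reported
chrSummary <- summary(states)
print(chrSummary)

# Factor: counts per level (levels are sorted alphabetically)
fctSummary <- summary(factor(states))
print(fctSummary)
```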
# USA county records: Country, State, County, FIPS, Population, Lat, Lon
if (!exists("dfUSACountyData")) {
  dfUSACountyData <- read.csv("https://raw.githubusercontent.com/antononcube/SystemModeling/master/Data/dfUSACountyRecords.csv",
                              # FIPS as character (keeps leading zeros), coordinates as numeric
                              colClasses = c("character", "character", "character", "character", "integer", "numeric", "numeric"),
                              stringsAsFactors = FALSE)
}
head(dfUSACountyData)
summary(as.data.frame(unclass(dfUSACountyData), stringsAsFactors = TRUE))
     Country             State              County               FIPS          Population          Lat              Lon
 UnitedStates:3143   Texas   : 254   WashingtonCounty:  30   01001  :   1   Min.   :      89   Min.   :19.60   Min.   :-166.90
                     Georgia : 159   JeffersonCounty :  25   01003  :   1   1st Qu.:   10980   1st Qu.:34.70   1st Qu.: -98.23
                     Virginia: 134   FranklinCounty  :  24   01005  :   1   Median :   25690   Median :38.37   Median : -90.40
                     Kentucky: 120   JacksonCounty   :  23   01007  :   1   Mean   :  102248   Mean   :38.46   Mean   : -92.28
                     Missouri: 115   LincolnCounty   :  23   01009  :   1   3rd Qu.:   67507   3rd Qu.:41.81   3rd Qu.: -83.43
                     Kansas  : 105   MadisonCounty   :  19   01011  :   1   Max.   :10170292   Max.   :69.30   Max.   : -67.63
                     (Other) :2256   (Other)         :2999   (Other):3137
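FIPS codes are fixed-width strings, and many of them start with a zero (the Alabama codes in the summary above begin with 01), which is why the colClasses in the ingestion code read FIPS as character. A sketch on a hypothetical two-row CSV of what happens otherwise, and how integer codes could be zero-padded back to five digits:

```r
# Hypothetical CSV text with the same FIPS layout (made-up populations)
csvText <- "FIPS,Population\n01001,100\n06037,200\n"

# FIPS read as character: leading zeros are kept
asChar <- read.csv(text = csvText, colClasses = c("character", "integer"))
print(asChar$FIPS)
# [1] "01001" "06037"

# Default type guessing: FIPS becomes a number and the zeros are lost
asInt <- read.csv(text = csvText)
print(asInt$FIPS)
# [1] 1001 6037

# Zero-padding to five digits restores the county codes
repaired <- sprintf("%05d", asInt$FIPS)
print(repaired)
# [1] "01001" "06037"
```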
library(dplyr)
# Inner join: county-date records whose Fips has no match in dfUSACountyData (e.g., an empty Fips) are dropped
dsNYDataCountiesExtended <-
  dfNYDataCounties %>%
  dplyr::inner_join(dfUSACountyData %>% dplyr::select(FIPS, Lat, Lon, Population),
                    by = c("Fips" = "FIPS"))
dsNYDataCountiesExtended
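The join above matches the Fips column of the NYT data with the FIPS column of the county records and keeps only the rows found in both tables. For comparison, a sketch of the same inner join with base R merge() on two hypothetical miniature tables (made-up values), showing that an empty Fips and an unmatched county code both disappear:

```r
# Hypothetical miniature tables
nyt <- data.frame(Fips  = c("01001", "06037", ""),
                  Cases = c(5L, 100L, 7L),
                  stringsAsFactors = FALSE)
cty <- data.frame(FIPS = c("01001", "06037", "53033"),
                  Lat  = c(32.5, 34.3, 47.5),
                  stringsAsFactors = FALSE)

# Inner join on differently named keys (all = FALSE is merge()'s default)
joined <- merge(nyt, cty, by.x = "Fips", by.y = "FIPS")
print(joined)
```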
ParetoPlotForColumns( dsNYDataCountiesExtended, c("Cases", "Deaths"), scales = "free" )
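ParetoPlotForColumns is not defined in this notebook, so its exact computation is not shown here. A generic Pareto computation, which is only an assumption about what such a plot displays, sorts the values in decreasing order and tracks the cumulative share of the total against the share of items. On hypothetical case counts for five counties:

```r
# Hypothetical case counts for five counties
cases <- c(5, 50, 10, 30, 5)

# Sort decreasingly and take cumulative shares of the total
sortedCases <- sort(cases, decreasing = TRUE)
cumShare    <- cumsum(sortedCases) / sum(sortedCases)
itemShare   <- seq_along(sortedCases) / length(sortedCases)

print(data.frame(itemShare, cumShare))
# In this toy example the top 40% of counties account for 80% of the cases
```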
Note that in the ggplot2 plot in this sub-section we filter out Hawaii and Alaska by keeping only locations with longitude greater than -130.
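# Keep only longitudes east of -130 (drops Alaska and Hawaii)
ggplot2::ggplot(dsNYDataCountiesExtended[dsNYDataCountiesExtended$Lon > -130, c("Lat", "Lon", "Cases")]) +
  # shape = 21 is a filled circle, so the fill aesthetic is drawn (the default point shape ignores fill)
  ggplot2::geom_point(ggplot2::aes(x = Lon, y = Lat, fill = log10(Cases)),
                      shape = 21, alpha = 0.01, size = 0.5, color = "blue") +
  ggplot2::coord_quickmap()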
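Both geo-plots use log10(Cases). The county summary shows a minimum of 0 cases, and log10(0) is -Inf, which is not a usable colour or radius value. One common workaround, an assumption here and not what the notebook does, is to transform Cases + 1 instead, or to drop the zero counts first:

```r
# Hypothetical case counts, including a zero
cases <- c(0L, 9L, 99L, 999L)

rawLog <- log10(cases)          # first element is -Inf
print(rawLog)

shiftedLog <- log10(cases + 1)  # finite everywhere; 0 cases map to 0
print(shiftedLog)
# [1] 0 1 2 3

# Alternative: keep only positive counts before transforming
positiveLog <- log10(cases[cases > 0])
print(positiveLog)
```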
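library(leaflet)
# Colour function mapping log10(Cases) to binned shades of the ColorBrewer "Reds" palette
cf <- colorBin(palette = "Reds", domain = log10(dsNYDataCountiesExtended$Cases), bins = 10)
# One circle marker per county-date record; radius and colour grow with log10(Cases)
m <-
  leaflet(dsNYDataCountiesExtended[, c("Lat", "Lon", "Cases")]) %>%
  addTiles() %>%
  addCircleMarkers(~Lon, ~Lat, radius = ~log10(Cases),
                   fillColor = ~cf(log10(Cases)), color = ~cf(log10(Cases)),
                   fillOpacity = 0.8, stroke = FALSE, popup = ~Cases)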
n too large, allowed maximum for palette Reds is 9
Returning the palette you asked for with that many colors
m
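The "n too large" messages come from requesting more colours than the nine that the ColorBrewer "Reds" palette provides. With its default pretty = TRUE, colorBin() chooses break points via pretty(), so bins = 10 is a target rather than an exact count. The sketch below mimics the binning idea with base R only: pretty() breaks, findInterval() to assign bins, and a colour ramp between two light and dark red endpoints that are assumptions, not leaflet's exact palette. It is an approximation of the idea, not leaflet's implementation.

```r
# Hypothetical log10(case) values (finite only)
values <- c(0.2, 1.1, 2.5, 3.7, 4.9)

# Break points from pretty(); n = 10 is a target, not a guarantee
breaks <- pretty(values, n = 10)
nBins  <- length(breaks) - 1

# Left-closed bins, as with colorBin's default right = FALSE
binIndex <- findInterval(values, breaks, rightmost.closed = TRUE, all.inside = TRUE)

# One colour per bin from a ramp between two assumed hex endpoints
ramp   <- grDevices::colorRampPalette(c("#FEE0D2", "#A50F15"))
colors <- ramp(nBins)[binIndex]
print(data.frame(values, bin = binIndex, color = colors))
```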